1 Problem Statement 


Suppose there are $n$ seats around a circular table, numbered $1, 2, \ldots, n$. In each trial, we
choose one of the seats at random and then mark $k$ seats (for some predefined number
$k < n$), counted counterclockwise from the chosen one, as selected. What is
the expected number of trials needed until all $n$ seats are selected?
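
A single run of this process can be sketched as follows. This is only a minimal illustration, not the program used later in the text: the function name `trials_to_cover` is hypothetical, and it assumes that a trial marks the chosen seat itself plus the next $k - 1$ seats counterclockwise, with seats indexed from 0.

```python
import random


def trials_to_cover(n, k, rng):
    """Run one experiment; return the number of trials until all n seats are selected.

    Assumption (not fixed by the problem statement): a trial marks the chosen
    seat and the next k - 1 seats counterclockwise, wrapping around the table.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    selected = [False] * n
    remaining = n          # seats not yet selected
    trials = 0
    while remaining > 0:
        start = rng.randrange(n)      # seat chosen at random, 0-based
        trials += 1
        for offset in range(k):
            seat = (start + offset) % n
            if not selected[seat]:
                selected[seat] = True
                remaining -= 1
    return trials


if __name__ == "__main__":
    rng = random.Random(1)
    print(trials_to_cover(10, 3, rng))
```

Under this convention a run can never finish in fewer than $\lceil n/k \rceil$ trials, which gives a quick sanity check on any simulated value.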

2 Solution Approach 

Initially, it seems that the problem can be attacked with straightforward probability
calculations. The difficulty is that, once the selected blocks start to overlap, the situation
becomes much harder to handle.

We also tried recurrence relations, but the problem breaks down into many different cases and a
lot of constraints come up.

So, as a more general approach, we decided to run some experiments, look at the experimental
values, analyze them from a statistical perspective and fit a model to them to get
an idea of the empirical value of the expectation.

3 Having Some Ideas About Relation With n and k 

• Setup 

Let $y_{nk}$ denote the required expectation. We want to analyze its behaviour separately
in $n$ and in $k$, and then combine the two to arrive at a final model.

3.1 Finding change due to n when k is fixed 

As mentioned, we first fix $k$. We wrote a simple C program that performs the
experiment 500 times for a fixed value of $n$ and takes the average, which should be a
good estimator of the expectation for that value of $n$. The program does this for 3000 values of $n$
(namely from $k$ to $k + 3000$) and stores the results in a file. We then read the data into R and
plotted $y_{nk}$ against $n$; for the following values of $k$, the plots looked like this :

• Observation


Observe that for $k = 15$ and $k = 20$, i.e. for small values of $k$, the plot looks roughly quadratic,
whereas for $k = 200$ it is almost linear. It may be that, for large values of
$k$, the quadratic curvature is so weak that the curve looks linear. So we must take both
possibilities into consideration.

• Conclusion 

We can conclude that $y_{nk}$ is proportional to $n$ or $n^2$ when $k$ is fixed.
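
The experiment of this subsection can be sketched in Python as below. It is a hedged stand-in for the C program mentioned above, not a copy of it: the names `estimate_expectation` and `sweep_n` are hypothetical, the marking convention (chosen seat plus the next $k - 1$ seats) is an assumption, and the demo uses a much shorter range of $n$ than the 3000 values described.

```python
import random
from statistics import mean


def trials_to_cover(n, k, rng):
    """One experiment, assuming a trial marks the chosen seat and the next k - 1 seats."""
    selected = [False] * n
    remaining, trials = n, 0
    while remaining:
        start = rng.randrange(n)
        trials += 1
        for offset in range(k):
            seat = (start + offset) % n
            if not selected[seat]:
                selected[seat] = True
                remaining -= 1
    return trials


def estimate_expectation(n, k, repetitions=500, seed=0):
    """Average the trial count over independent repetitions (Monte Carlo estimate of y_nk)."""
    rng = random.Random(seed)
    return mean(trials_to_cover(n, k, rng) for _ in range(repetitions))


def sweep_n(k, n_values, repetitions=500, seed=0):
    """Return (n, estimate) pairs for a fixed k, one estimate per value of n."""
    return [(n, estimate_expectation(n, k, repetitions, seed + n)) for n in n_values]


if __name__ == "__main__":
    # Small demo range; the text uses n from k to k + 3000.
    for n, estimate in sweep_n(15, range(15, 116, 20)):
        print(n, estimate)
```

Writing the pairs to a file and plotting the estimate against $n$ gives the kind of curve discussed in the observation above.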

3.2 Finding change due to k when n is fixed 

As before, we first fix $n$. The values of $k$ can then range from 1 to $n$. Again, we wrote
a C program that performs the experiment 500 times for every value of $k$ and takes the average as an estimator
of the expectation, and we did the same for different values of $n$. We then
read the data into R and plotted $y_{nk}$ against $k$; for $n = 500$, the plot looked like this :

• Observation

Observe that the plot looks like the graph of $\frac{1}{k}$. So we plotted $y_{nk}$ against $\frac{1}{k}$ for $n$
= 500, 1000 and 2000 (counterclockwise), and the graphs came out like this :

• Conclusion

Since the final plot is a clear straight line, we can conclude that $y_{nk}$ is proportional to
$\frac{1}{k}$ when $n$ is fixed.
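
One way to check this kind of straight-line relation numerically, rather than by eye, is to compute the correlation between the simulated averages and $\frac{1}{k}$. The sketch below is only illustrative: the function names are hypothetical, the marking convention (chosen seat plus the next $k - 1$ seats) is an assumption, and the small $n$ and repetition count are chosen for speed, not taken from the text.

```python
import math
import random


def trials_to_cover(n, k, rng):
    """One experiment, assuming a trial marks the chosen seat and the next k - 1 seats."""
    selected = [False] * n
    remaining, trials = n, 0
    while remaining:
        start = rng.randrange(n)
        trials += 1
        for offset in range(k):
            seat = (start + offset) % n
            if not selected[seat]:
                selected[seat] = True
                remaining -= 1
    return trials


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def reciprocal_k_correlation(n, repetitions=200, seed=0):
    """Correlation between 1/k and the simulated average trial count, for k = 1..n."""
    rng = random.Random(seed)
    ks = list(range(1, n + 1))
    estimates = [
        sum(trials_to_cover(n, k, rng) for _ in range(repetitions)) / repetitions
        for k in ks
    ]
    return pearson([1.0 / k for k in ks], estimates)


if __name__ == "__main__":
    print(reciprocal_k_correlation(50))
```

A correlation close to 1 is consistent with the straight-line plot, although it does not by itself rule out a mild curvature.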

4 Fitting Model 

Based on the cases above, we found that there might be a few possible models. We first list the
models, then fit them one by one, analyze the errors and try to pick
the best among them. The models are as follows :

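• Model 1 : $y_{nk} = \mu + \alpha n + \beta \frac{1}{k} + \gamma \frac{n}{k} + \epsilon_{nk}$, where $\epsilon_{nk} \sim (0, \sigma^2)$

• Model 2 : $y_{nk} = \mu + \alpha n^2 + \beta \frac{1}{k} + \gamma \frac{n^2}{k} + \epsilon_{nk}$, where $\epsilon_{nk} \sim (0, \sigma^2)$

• Model 3 : $y_{nk} = \mu + \alpha_1 n + \alpha_2 n^2 + \beta \frac{1}{k} + \gamma_1 \frac{n}{k} + \gamma_2 \frac{n^2}{k} + \epsilon_{nk}$, where $\epsilon_{nk} \sim (0, \sigma^2)$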

For this purpose, we take 100 values of $n$, namely $n = 1, 2, \ldots, 100$, and $k = 1, \ldots, n$ for each. So,
altogether we have $\frac{100 \times 101}{2} = 5050$ observations.
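
The count of observations can be confirmed by enumerating the $(n, k)$ pairs directly. This is a small sketch; the function name `observation_grid` is hypothetical.

```python
def observation_grid(n_max=100):
    """All (n, k) pairs with n = 1..n_max and k = 1..n."""
    return [(n, k) for n in range(1, n_max + 1) for k in range(1, n + 1)]


if __name__ == "__main__":
    grid = observation_grid(100)
    print(len(grid))            # number of (n, k) pairs
    print(grid[:3], grid[-1])   # first few pairs and the last one
```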

4.1 Fitting Model 1 

$$y_{nk} = \mu + \alpha n + \beta \frac{1}{k} + \gamma \frac{n}{k} + \epsilon_{nk}, \quad n = 1, 2, \ldots, 100, \quad k = 1, \ldots, n$$
So, we can write the model in vector notation as : 
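
One standard way to write such a vector form is sketched below. The symbols $\mathbf{y}$, $X$ and $\boldsymbol{\theta}$ and the ordering of the rows (by $n$, then by $k$) are assumptions made here for illustration; each row of $X$ holds the regressors $1$, $n$, $\frac{1}{k}$ and $\frac{n}{k}$ of Model 1 for one pair $(n, k)$.

```latex
\mathbf{y} = X\boldsymbol{\theta} + \boldsymbol{\epsilon},
\qquad
\boldsymbol{\theta} = (\mu, \alpha, \beta, \gamma)^{\top},
\qquad
X = \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 2 & 1 & 2 \\
1 & 2 & \tfrac{1}{2} & 1 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 100 & \tfrac{1}{100} & 1
\end{pmatrix}
```

Here $\mathbf{y}$ and $\boldsymbol{\epsilon}$ are vectors of length 5050 and $X$ is a $5050 \times 4$ matrix.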

So, once we have created the design matrix and collected the observations by experiment, we
are all set to fit a linear model. The model fitted in R and the characteristics
of the fit are given below :


$$y_{nk} = -3.0493227 - 0.0270177\,n - 29.5619357\,\frac{1}{k} + 5.3222732\,\frac{n}{k}$$

• Estimated error variance : 5.266276 

• Akaike Information Criterion : 22688.46

• Bayesian Information Criterion : 22721.09
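
The fitting step can be sketched without R as an ordinary least-squares problem solved through the normal equations. This is a hedged illustration, not the code behind the numbers above: all function names are hypothetical, and the AIC and BIC formulas assume Gaussian errors with the error variance counted as one extra parameter, an assumption that agrees with the gap between the reported BIC and AIC, $(\ln 5050 - 2) \times 5 \approx 32.63$.

```python
import math


def design_row_model1(n, k):
    """Regressors of Model 1 for one observation: 1, n, 1/k, n/k."""
    return [1.0, float(n), 1.0 / k, n / k]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def fit_least_squares(rows, ys):
    """Return (coefficients, residual sum of squares) from the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    rss = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, ys))
    return coef, rss


def information_criteria(rss, n_obs, n_coef):
    """Gaussian AIC and BIC, counting the error variance as one extra parameter."""
    log_lik = -0.5 * n_obs * (math.log(2 * math.pi) + math.log(rss / n_obs) + 1)
    n_par = n_coef + 1
    return -2 * log_lik + 2 * n_par, -2 * log_lik + math.log(n_obs) * n_par
```

Given simulated averages `ys` for a list of `(n, k)` pairs, calling `fit_least_squares([design_row_model1(n, k) for n, k in pairs], ys)` yields the four coefficients; the other two models differ only in the regressors placed in each row.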

4.2 Fitting Model 2 


$$y_{nk} = \mu + \alpha n^2 + \beta \frac{1}{k} + \gamma \frac{n^2}{k} + \epsilon_{nk}, \quad n = 1, 2, \ldots, 100, \quad k = 1, \ldots, n$$
So, we can write the model in vector notation as : 

So, once we have created the design matrix and collected the observations by experiment, we
are all set to fit a linear model. The model fitted in R and the characteristics
of the fit are given below :


$$y_{nk} = -1.8139844571 - -0.0005231321\,n^2 + 68.0364404730\,\frac{1}{k} + 0.0503612273\,\frac{n^2}{k}$$

• Estimated error variance : 27.61731 

• Akaike Information Criterion : 31095.41

• Bayesian Information Criterion : 31128.05

4.3 Fitting Model 3


$$y_{nk} = \mu + \alpha_1 n + \alpha_2 n^2 + \beta \frac{1}{k} + \gamma_1 \frac{n}{k} + \gamma_2 \frac{n^2}{k} + \epsilon_{nk}, \quad n = 1, 2, \ldots, 100, \quad k = 1, \ldots, n$$
So, we can write the model in vector notation as : 

So, once we have created the design matrix and collected the observations by experiment, we
are all set to fit a linear model. The model fitted in R and the characteristics
of the fit are given below :


$$y_{nk} = -1.875756 - 0.051667\,n + 0.00010418\,n^2 - 10.47418\,\frac{1}{k} + 4.149027\,\frac{n}{k} - 0.0117558\,\frac{n^2}{k}$$

• Estimated error variance : 2.831177 

• Akaike Information Criterion : 19594.77

• Bayesian Information Criterion : 19640.46

5 Conclusion 

According to the Akaike Information Criterion, Model 3 has the best
fit among the three proposed models; it also has the lowest Bayesian Information Criterion
and the smallest estimated error variance. Hence we can take it as an empirical model for our
purpose.
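
The comparison can be written out as a small lookup over the figures reported in Section 4. The dictionary layout and the function name `best_model` are hypothetical; the numbers are copied from the fits above, and a smaller value is better for each of the three criteria.

```python
# Fit characteristics as reported in Sections 4.1 to 4.3.
REPORTED_FITS = {
    "Model 1": {"error_variance": 5.266276, "aic": 22688.46, "bic": 22721.09},
    "Model 2": {"error_variance": 27.61731, "aic": 31095.41, "bic": 31128.05},
    "Model 3": {"error_variance": 2.831177, "aic": 19594.77, "bic": 19640.46},
}


def best_model(fits, criterion):
    """Return the name of the model with the smallest value of the given criterion."""
    return min(fits, key=lambda name: fits[name][criterion])


if __name__ == "__main__":
    for criterion in ("aic", "bic", "error_variance"):
        print(criterion, best_model(REPORTED_FITS, criterion))
```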